11 research outputs found
Best bang for your buck: GPU nodes for GROMACS biomolecular simulations
The molecular dynamics simulation package GROMACS runs efficiently on a wide
variety of hardware from commodity workstations to high performance computing
clusters. Hardware features are well exploited with a combination of SIMD,
multi-threading, and MPI-based SPMD/MPMD parallelism, while GPUs can be used as
accelerators to compute interactions offloaded from the CPU. Here we evaluate
which hardware produces trajectories with GROMACS 4.6 or 5.0 in the most
economical way. We have assembled and benchmarked compute nodes with various
CPU/GPU combinations to identify optimal compositions in terms of raw
trajectory production rate, performance-to-price ratio, energy efficiency, and
several other criteria. Though hardware prices are naturally subject to trends
and fluctuations, general tendencies are clearly visible. Adding any type of
GPU significantly boosts a node's simulation performance. For inexpensive
consumer-class GPUs, this improvement is equally reflected in the
performance-to-price ratio. Although memory issues in consumer-class GPUs could
pass unnoticed since these cards do not support ECC memory, unreliable GPUs can
be sorted out with memory checking tools. Apart from the obvious determinants
for cost-efficiency like hardware expenses and raw performance, the energy
consumption of a node is a major cost factor. Over the typical hardware
lifetime until replacement of a few years, the costs for electrical power and
cooling can become larger than the costs of the hardware itself. Taking that
into account, nodes with a well-balanced ratio of CPU and consumer-class GPU
resources produce the maximum amount of GROMACS trajectory over their lifetime.
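The cost argument in the abstract above can be made concrete with a small calculation. The following is a minimal sketch, assuming a simple model in which lifetime cost is hardware price plus energy cost (with an optional cooling overhead factor). The function names, the cooling model, and all node figures are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the cost model and all numbers are assumptions,
# not values or methods from the paper.

HOURS_PER_YEAR = 365 * 24


def lifetime_cost(hardware_price, power_watts, years, price_per_kwh,
                  cooling_overhead=0.0):
    """Hardware price plus energy cost over the node's lifetime.

    cooling_overhead is a fraction added on top of the energy cost
    (0.5 means cooling costs half as much again as the electricity).
    """
    energy_kwh = power_watts / 1000.0 * HOURS_PER_YEAR * years
    energy_cost = energy_kwh * price_per_kwh
    return hardware_price + energy_cost * (1.0 + cooling_overhead)


def trajectory_per_cost(ns_per_day, hardware_price, power_watts, years,
                        price_per_kwh, cooling_overhead=0.0):
    """Nanoseconds of trajectory produced per currency unit over the lifetime."""
    total_ns = ns_per_day * 365 * years
    cost = lifetime_cost(hardware_price, power_watts, years, price_per_kwh,
                         cooling_overhead)
    return total_ns / cost


# Hypothetical nodes: (ns/day, hardware price, average power draw in watts).
nodes = {
    "CPU-only node": (20.0, 3000.0, 300.0),
    "CPU + consumer GPU node": (50.0, 3500.0, 450.0),
}

for name, (ns_per_day, price, watts) in nodes.items():
    value = trajectory_per_cost(ns_per_day, price, watts, years=5,
                                price_per_kwh=0.2, cooling_overhead=0.5)
    print(f"{name}: {value:.3f} ns per currency unit")
```

With inputs like these, the energy term can approach or exceed the hardware price over a multi-year lifetime, which is why a ranking based on purchase price alone can differ from one based on lifetime cost.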
More Bang for Your Buck: Improved use of GPU Nodes for GROMACS 2018
We identify hardware that is optimal to produce molecular dynamics
trajectories on Linux compute clusters with the GROMACS 2018 simulation
package. To this end, we benchmark the GROMACS performance on a diverse set of
compute nodes and relate it to the costs of the nodes, which may include their
lifetime costs for energy and cooling. In agreement with our earlier
investigation using GROMACS 4.6 on hardware of 2014, the performance to price
ratio of consumer GPU nodes is considerably higher than that of CPU nodes.
However, with GROMACS 2018, the optimal CPU to GPU processing power balance has
shifted even more towards the GPU. Hence, nodes optimized for GROMACS 2018 and
later versions enable a significantly higher performance to price ratio than
nodes optimized for older GROMACS versions. Moreover, the shift towards GPU
processing allows old nodes to be upgraded cheaply with recent GPUs, yielding
essentially the same performance as comparable brand-new hardware.
Comment: 41 pages, 13 figures, 4 tables. This updated version includes the
following improvements: - most notably, added benchmarks for two coarse grain
MARTINI systems VES and BIG, resulting in a new Figure 13 - fixed typos -
made text clearer in some places - added two more benchmarks for MEM and RIB
systems (E3-1240v6 + RTX 2080 / 2080Ti
GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers
GROMACS is one of the most widely used open-source and free software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types, preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights through several new and enhanced parallelization algorithms. These work on every level: SIMD registers inside cores, multithreading, heterogeneous CPU–GPU acceleration, state-of-the-art 3D domain decomposition, and ensemble-level parallelization through built-in replica exchange and the separate Copernicus framework. The latest best-in-class compressed trajectory storage format is supported.
Highly Tuned Small Matrix Multiplications Applied to Spectral Element Code Nek5000
Proceedings of: Third International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2016), Sofia (Bulgaria), October 6-7, 2016.
Nek5000 is an open-source code for simulating incompressible flows using MPI for parallel communication. In the Nek5000
code, the tensor-product-based operator evaluation can be implemented as small dense matrix-matrix multiplications. It is clear
that the routines for calculating the matrix-matrix product dominate the execution time of Nek5000. In this paper, we optimize
matrix-matrix multiplication using SIMD intrinsics and the LIBXSMM package. The evaluation of the
computational cost and optimization of these subroutines is not only applied to the CFD code Nek5000, but also to the NekCEM
and NekLEM software, which share the same data structures as Nek5000.
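To illustrate why a tensor-product operator reduces to small dense matrix-matrix multiplications, here is a minimal two-dimensional sketch in plain Python. It is not Nek5000 code and does not use LIBXSMM or SIMD intrinsics. The function names and the 2x2 example matrices are hypothetical, and the sketch assumes the element data is stored as a small N x N array U. Applying the operator A ⊗ B then amounts to computing A U Bᵀ, which is two small matrix products rather than one large product with the full Kronecker matrix.

```python
# Illustrative sketch only: a 2D tensor-product operator applied via two
# small matrix multiplications. Not taken from Nek5000 or LIBXSMM.


def matmul(a, b):
    """Dense product of two matrices stored as lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def transpose(m):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def apply_tensor_2d(a, b, u):
    """Compute w[i][j] = sum_{k,l} a[i][k] * b[j][l] * u[k][l] as A U B^T."""
    return matmul(matmul(a, u), transpose(b))


def apply_naive(a, b, u):
    """Same operator evaluated directly from its definition, for comparison."""
    n, m = len(a), len(b)
    return [[sum(a[i][k] * b[j][l] * u[k][l]
                 for k in range(len(u)) for l in range(len(u[0])))
             for j in range(m)]
            for i in range(n)]


# Hypothetical 2x2 example data.
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
U = [[1, 2], [3, 4]]

print(apply_tensor_2d(A, B, U))
print(apply_naive(A, B, U))
```

Both calls produce the same matrix. In three dimensions the same idea applies along each direction in turn, so the work consists of many products of small matrices, which is the kernel that specialized small-matrix libraries target.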
GROMACS 4.6 heterogeneous CPU-GPU acceleration
Control and data flow of the heterogeneous parallelization in GROMACS 4.6. The diagram illustrates both normal MD steps (black lines) and those steps in which the pair search and domain decomposition are carried out (blue). In the latter, the additional transfer of the pair list from the CPU to the GPU and a subsequent pruning done in the CUDA kernel are also indicated.
Intro to HPC for Life Scientists: Mapping computation to HPC hardware & GPU accelerators and heterogeneous architectures
Lecture slides and exercises for BioExcel-PerMedCoE Introduction to HPC for Life Scientists.
Mapping computation to HPC hardware: molecular simulation (lecture)
GPU accelerators and heterogeneous architectures (lecture)
Introduction to HPC: molecular dynamics simulations with GROMACS (exercises)
Barcelona, March 8, 2023
GROMACS 5.1 vs 2016 performance in GPU-accelerated ceramide pull simulations
Computational modeling of the skin barrier, the lipid matrix of the stratum corneum, using molecular dynamics simulations. These studies give further insight into the largest organ in the human body and will further clinical experiments. Thanks to the highly optimized heterogeneous parallelization in GROMACS, complex computational studies can be carried out quickly and efficiently. The right panel shows the modeled molecular system with ceramide molecules in green, cholesterols in white, and fatty acids in red.
Recent work focused on improving SIMD, GPU, and thread parallelization and resulted in speeding up the aforementioned calculations by up to 50%. The left panel illustrates the execution time breakdown of the CPU and GPU tasks in GROMACS versions 5.1 and 2016; improved performance of multiple tasks leads to an increase in simulation throughput from 61 ns/day to 95 ns/day.
The simulations were performed on a workstation equipped with a Core i7-5960X CPU and a GeForce TITAN X GPU.
Benchmarks and performance illustration (left panel) by S.P.; the ceramide system model and rendering by M.L.
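The throughput figures quoted above (61 ns/day and 95 ns/day) can be converted into relative changes with a short calculation. This is a minimal sketch with a hypothetical function name. It reports both the relative throughput gain and the corresponding reduction in wall-clock time per simulated nanosecond, since the two measures differ. The description does not state which measure its "up to 50%" figure refers to.

```python
# Illustrative sketch only: converts two throughput values (ns/day) into
# relative changes. The function name is an assumption for this example.


def relative_change(old_ns_per_day, new_ns_per_day):
    """Return (throughput gain, reduction in time per simulated ns) as fractions."""
    throughput_gain = new_ns_per_day / old_ns_per_day - 1.0
    time_reduction = 1.0 - old_ns_per_day / new_ns_per_day
    return throughput_gain, time_reduction


gain, reduction = relative_change(61, 95)
print(f"throughput gain: {gain:.1%}")
print(f"time per simulated ns reduced by: {reduction:.1%}")
```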